Multiple Alternative Sentence Compressions as a Tool for Automatic Summarization Tasks

Authors

  • David M. Zajic
  • Bonnie J. Dorr
  • Jimmy Lin
  • Lawrence Moss
  • Richard M. Schwartz
Abstract

Title of dissertation: MULTIPLE ALTERNATIVE SENTENCE COMPRESSIONS AS A TOOL FOR AUTOMATIC SUMMARIZATION TASKS
David M. Zajic, Doctor of Philosophy, 2007
Dissertation directed by: Professor Bonnie J. Dorr, advisor; Professor Jimmy Lin, co-advisor
Department of Computer Science

Automatic summarization is the distillation of important information from a source into an abridged form for a particular user or task. Many current systems summarize texts by selecting sentences with important content. The limitation of extraction at the sentence level is that highly relevant sentences may also contain non-relevant and redundant content. This thesis presents a novel framework for text summarization that addresses the limitations of sentence-level extraction. Under this framework, text summarization is performed by generating Multiple Alternative Sentence Compressions (MASC) as candidate summary components and using weighted features of the candidates to construct summaries from them. Sentence compression is the rewriting of a sentence in a shorter form. This framework provides an environment in which hypotheses about summarization techniques can be tested. Three approaches to sentence compression were developed under this framework. The first approach, HMM Hedge, uses the Noisy Channel Model to calculate the most likely compressions of a sentence. The second approach, Trimmer, uses syntactic trimming rules that are linguistically motivated by Headlinese, a form of compressed English associated with newspaper headlines. The third approach, Topiary, is a combination of fluent text with topic terms. The MASC framework for automatic text summarization has been applied to the tasks of headline generation and multi-document summarization, and has been used for initial work in summarization of novel genres and applications, including broadcast news, email threads, cross-language, and structured queries.
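
The selection step described above, scoring candidate compressions with weighted features and assembling a summary from them, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the names `Candidate`, `score`, and `select_summary`, the feature names, the word budget, and the greedy "one compression per source sentence" rule are hypothetical and are not the dissertation's actual sentence selector or its weight-tuning method.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One compressed version of a source sentence (hypothetical structure)."""
    source_id: int  # which source sentence this compression was derived from
    text: str       # the compressed sentence itself
    features: dict[str, float] = field(default_factory=dict)


def score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Weighted linear combination of features; unweighted features count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in candidate.features.items())


def select_summary(candidates: list[Candidate], weights: dict[str, float],
                   max_words: int) -> list[str]:
    """Greedily pick high-scoring candidates within a word budget.

    At most one compression per source sentence is kept, so alternative
    compressions of the same sentence never appear together.
    """
    chosen: list[str] = []
    used_sources: set[int] = set()
    length = 0
    for cand in sorted(candidates, key=lambda c: score(c, weights), reverse=True):
        n_words = len(cand.text.split())
        if cand.source_id in used_sources or length + n_words > max_words:
            continue  # another compression of this sentence was chosen, or over budget
        chosen.append(cand.text)
        used_sources.add(cand.source_id)
        length += n_words
    return chosen


if __name__ == "__main__":
    # Hypothetical features: relevance score, sentence position, and word count.
    weights = {"relevance": 1.0, "position": 0.5, "length": -0.1}
    pool = [
        Candidate(0, "Officials approved the budget on Monday after a long debate",
                  {"relevance": 0.9, "position": 1.0, "length": 10}),
        Candidate(0, "Officials approved the budget",
                  {"relevance": 0.8, "position": 1.0, "length": 4}),
        Candidate(1, "The vote was 52 to 48",
                  {"relevance": 0.4, "position": 0.5, "length": 6}),
    ]
    print(select_summary(pool, weights, max_words=10))
    # -> ['Officials approved the budget', 'The vote was 52 to 48']
```

In this toy pool the shorter compression of sentence 0 outscores its longer variant, which leaves room in the 10-word budget for a second sentence. A greedy pass is used here only because it is short; a fuller selector could, for example, recompute redundancy features as the summary grows.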
The framework supports combinations of component techniques, fostering collaboration between development teams. Three results will be demonstrated under the MASC framework. The first is that an extractive summarization system can produce better summaries by automatically selecting from a pool of compressed sentence candidates than by automatically selecting from unaltered source sentences. The second result is that sentence selectors can construct better summaries from pools of compressed candidates when they make use of larger candidate feature sets. The third result is that for the task of Headline Generation, a combination of topic terms and compressed sentences performs better than either approach alone. Experimental evidence supports all three results.
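
The Topiary idea mentioned in the abstract, fluent compressed text combined with topic terms under a headline length limit, might be sketched as follows. This is a hypothetical illustration, not the dissertation's implementation: the function name `topiary_headline`, the character budget, the rule of moving to shorter compressions only until at least one new topic term fits, the single-word topic terms, and the plain-space separator are all assumptions.

```python
def topiary_headline(compressions: list[str], topic_terms: list[str],
                     max_chars: int) -> str:
    """Hypothetical Topiary-style combination of a compression and topic terms.

    compressions: alternative compressions of one sentence, in any order.
    topic_terms: single-word topic terms ranked from most to least important.
    """
    # Prefer the longest (most fluent) compression that fits the budget.
    fitting = sorted((c for c in compressions if len(c) <= max_chars),
                     key=len, reverse=True)
    for sentence in fitting:
        covered = {w.lower() for w in sentence.split()}
        prefix: list[str] = []
        for term in topic_terms:
            if term.lower() in covered:
                continue  # the fluent text already contains this term
            trial = " ".join(prefix + [term]) + " " + sentence
            if len(trial) <= max_chars:
                prefix.append(term)
        if prefix:
            # Assumed stopping rule: stop shortening once a new topic term fits.
            return " ".join(prefix) + " " + sentence
    # No compression leaves room for a new term: fall back to the longest that fits.
    return fitting[0] if fitting else ""


if __name__ == "__main__":
    compressions = [
        "Senate leaders approved the annual budget on Monday",
        "Senate leaders approved the budget",
        "Senate approved budget",
    ]
    terms = ["budget", "tax", "deficit"]
    print(topiary_headline(compressions, terms, max_chars=45))
    # -> tax Senate leaders approved the budget
    print(topiary_headline(compressions, terms, max_chars=36))
    # -> tax deficit Senate approved budget
```

With the tighter 36-character budget the sketch drops to a shorter compression so that two topic terms fit, which illustrates the trade-off between fluent text and topic coverage that a Topiary-style combination has to manage.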

Similar resources

Multiple Alternative Sentence Compressions for Automatic Text Summarization

We perform multi-document summarization by generating compressed versions of source sentences as summary candidates and using weighted features of these candidates to construct summaries. We combine a parse-and-trim approach with a novel technique for producing multiple alternative compressions for source sentences. In addition, we use a novel method for tuning the feature weights that maximize...

Multiple Alternative Sentence Compressions and Word-Pair Antonymy for Automatic Text Summarization and Recognizing Textual Entailment

The University of Maryland participated in three tasks organized by the Text Analysis Conference 2008 (TAC 2008): (1) the update task of text summarization; (2) the opinion task of text summarization; and (3) recognizing textual entailment (RTE). At the heart of our summarization system is Trimmer, which generates multiple alternative compressed versions of the source sentences that act as cand...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Single-document and multi-document summarization techniques for email threads using sentence compression

We present two approaches to email thread summarization: Collective Message Summarization (CMS) applies a multi-document summarization approach, while Individual Message Summarization (IMS) treats the problem as a sequence of single-document summarization tasks. Both approaches are implemented in our general framework driven by sentence compression. Instead of a purely extractive approach, we e...

Automatic Generation of Natural Language Summaries

Automatic text summarization has gained much interest in the last few years, since it could, at least in principle, make the process of information seeking in large document collections less tedious and time-consuming. Most existing summarization methods generate summaries by initially extracting the sentences that are most relevant to the user’s query from documents returned by an information ...

Publication date: 2007